Bilinear Attention Networks
Attention networks in multimodal learning provide an efficient way to selectively utilize given visual information. However, the computational cost of learning attention distributions for every pair of multimodal input channels is prohibitively expensive. To address this problem, co-attention builds two separate attention distributions, one for each modality, neglecting the interaction between multimodal inputs. In this paper, we propose bilinear attention networks (BAN) that find bilinear attention distributions to utilize given vision-language information seamlessly. BAN considers bilinear interactions between two groups of input channels, while low-rank bilinear pooling extracts the joint representation for each pair of channels. Furthermore, we propose a variant of multimodal residual networks to exploit the eight attention maps of BAN efficiently. We quantitatively and qualitatively evaluate our model on the visual question answering (VQA 2.0) and Flickr30k Entities datasets, showing that BAN significantly outperforms previous methods and achieves new state-of-the-art results on both datasets.
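To make the bilinear attention computation concrete, here is a minimal NumPy sketch of a low-rank bilinear attention map over all pairs of visual and textual channels. It illustrates the general idea rather than the authors' implementation; the dimensions, the ReLU nonlinearity, and the projections U, V, p are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, dx, dy, dk = 4, 3, 8, 6, 5       # channel counts and feature sizes (assumed)
X = rng.normal(size=(N, dx))           # N visual channels
Y = rng.normal(size=(M, dy))           # M textual channels
U = rng.normal(size=(dx, dk))          # low-rank projections (hypothetical)
V = rng.normal(size=(dy, dk))
p = rng.normal(size=dk)                # pooling vector (hypothetical)

# Low-rank bilinear logit for each channel pair (i, j):
#   logits[i, j] = p^T (relu(U^T x_i) * relu(V^T y_j))
Hx = np.maximum(X @ U, 0.0)            # (N, dk)
Hy = np.maximum(Y @ V, 0.0)            # (M, dk)
logits = (Hx * p) @ Hy.T               # (N, M)

# One attention distribution over all N*M pairs, unlike co-attention,
# which normalizes each modality separately.
A = np.exp(logits - logits.max())
A /= A.sum()

# Joint representation via low-rank bilinear pooling weighted by A:
#   f[k] = sum_ij A[i, j] * Hx[i, k] * Hy[j, k]
f = ((Hx.T @ A) * Hy.T).sum(axis=1)    # (dk,)
```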
Switching control of underactuated multi-channel systems with input constraints for cooperative manipulation
Lee, Dongjae, Dimarogonas, Dimos V., Kim, H. Jin
Abstract--This work presents an event-triggered switching control framework for a class of nonlinear underactuated multi-channel systems with input constraints. These systems are inspired by cooperative manipulation tasks involving underactuation, where multiple underactuated agents collaboratively push or pull an object to a target pose. To simultaneously account for channel assignment, input constraints, and stabilization, we formulate the control problem as a mixed-integer linear program (MILP) and derive sufficient conditions for its feasibility. To improve real-time computational efficiency, we introduce an event-triggered control scheme that maintains stability even between switching events through a quadratic-programming-based stabilizing controller. We theoretically establish the semi-global exponential stability of the proposed method and the asymptotic stability of its extension to nonprehensile cooperative manipulation under noninstantaneous switching. The proposed framework is further validated through numerical simulations on 2D and 3D free-flyer systems and multi-robot nonprehensile pushing tasks. Cooperative tasks involving objects that are collectively controlled by multiple agents, such as drone swarms and robotic arms in manufacturing, rely on precise object manipulation.
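As a rough illustration of a quadratic-programming-based stabilizing controller with input constraints, the sketch below solves a control-Lyapunov-style QP for a toy single-integrator system using cvxpy. The dynamics, the Lyapunov candidate V = ||x||^2/2, and the bounds are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np
import cvxpy as cp

def qp_stabilizer(x, alpha=1.0, u_max=1.0):
    """Toy CLF-QP: minimize control effort subject to a Lyapunov decrease condition."""
    u = cp.Variable(x.shape[0])
    V = 0.5 * float(x @ x)              # Lyapunov candidate V = ||x||^2 / 2
    constraints = [
        x @ u <= -alpha * V,            # decrease condition: Vdot <= -alpha * V
        cp.norm(u, "inf") <= u_max,     # per-channel input constraint
    ]
    cp.Problem(cp.Minimize(cp.sum_squares(u)), constraints).solve()
    return u.value

x = np.array([1.0, -0.5])               # current state of the toy system xdot = u
print(qp_stabilizer(x))                 # input steering x toward the origin
```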
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > Canada > Quebec > Montreal (0.04)
A Appendix
All CPU experiments are conducted on AWS C5.9xlarge instances with Intel Xeon Platinum 8124M processors. Take TensorCore GPUs as an example. MetaSchedule makes an orthogonal contribution, as it is a probabilistic language for composable search space construction rather than a method for speeding up tuning. The tensor program to be optimized is generated from the computational graph of a frontend framework, for example, TensorFlow, PyTorch, or JAX.

A.7 Available Transformation Primitives

- split: Split a loop into a sequence of consecutive loops
- fuse: Fuse a sequence of consecutive loops into one
- reorder: Reorder a sequence of loops
- parallel: Parallelize a loop across CPU cores
- vectorize: Vectorize a loop with SIMD
- unroll: Unroll a loop
- bind: Bind a loop to a GPU thread
- cache-read: Create a block that reads a buffer region into a read cache
- cache-write: Create a block that writes a buffer region into a write cache
- compute-at: Move a producer block under the specified loop
- compute-inline: Inline a block into its consumer(s)
- rfactor: Factorize an associative reduction block by the specified loop
- storage-align: Set an alignment requirement for a specific dimension of a buffer
- set-scope: Set the storage scope of a buffer
- add-unit-loop: Create a new unit loop on top of the specified block
- re-index
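To make these primitives concrete, here is a brief sketch using TVM's TensorIR schedule API (assuming a recent TVM installation); the matmul workload, split factors, and loop order are arbitrary illustrative choices.

```python
import tvm
from tvm import te

# Define a 1024x1024 matmul and lower it to a TensorIR PrimFunc.
A = te.placeholder((1024, 1024), name="A")
B = te.placeholder((1024, 1024), name="B")
k = te.reduce_axis((0, 1024), name="k")
C = te.compute((1024, 1024),
               lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
mod = tvm.IRModule({"main": te.create_prim_func([A, B, C])})

# Apply a few of the transformation primitives listed above.
sch = tvm.tir.Schedule(mod)
block = sch.get_block("C")
i, j, kk = sch.get_loops(block)
io, ii = sch.split(i, factors=[None, 32])   # split
jo, ji = sch.split(j, factors=[None, 32])
sch.reorder(io, jo, kk, ii, ji)             # reorder
sch.parallel(io)                            # parallel across CPU cores
sch.vectorize(ji)                           # vectorize the innermost loop with SIMD
```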
Channel Gating Neural Networks
Weizhe Hua, Yuan Zhou, Christopher M. De Sa, Zhiru Zhang, G. Edward Suh
Unlike static network pruning, channel gating optimizes CNN inference at run time by exploiting input-specific characteristics, which allows substantially reducing the compute cost with almost no accuracy loss. We experimentally show that applying channel gating to state-of-the-art networks achieves a 2.7-8.0x reduction in compute cost.
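The run-time gating mechanism can be sketched in a few lines; this is an illustration of the general idea, not the authors' code. A cheap partial sum over a fraction of the input channels decides, per output location, whether the remaining channels need to be evaluated; the 1x1-convolution setting, fraction, and threshold are assumptions.

```python
import numpy as np

def gated_conv1x1(x, W, frac=0.25, tau=0.0):
    """x: (C_in, H, W) input; W: (C_out, C_in) 1x1-convolution weights."""
    c = int(x.shape[0] * frac)
    # Cheap partial sum over the first c input channels.
    partial = np.einsum("oc,chw->ohw", W[:, :c], x[:c])
    # Per-output, per-location gate; where it is off, the remaining channels
    # are skipped (computed densely here for clarity; a real kernel would
    # evaluate them only where gate is True).
    gate = partial > tau
    rest = np.einsum("oc,chw->ohw", W[:, c:], x[c:])
    return np.where(gate, partial + rest, partial)

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8, 8))        # 16 input channels, 8x8 feature map
W = rng.normal(size=(32, 16))          # 32 output channels
y = gated_conv1x1(x, W)                # weak partial responses skip the rest
```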
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States (0.04)
- North America > Canada (0.04)
Appendices

A Bernoulli-CRS Properties

Let us further derive the properties of the proposed sampling algorithm. Let us define K ∈ R. First, we show that the above holds in expectation: Proposition 1. The sampled estimate equals A in expectation. The number of sampled pairs T is controlled through the parameter k: Proposition 2. E[T] = k. This formulation immediately hints at the possibility of sampling over the input channel dimension, similarly to sampling column-row pairs in matrices. For notational simplicity, we assume zero padding. Figure 2 illustrates the sampling operation.
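A minimal sketch of Bernoulli column-row sampling for an approximate matrix product, consistent with the two properties above (the estimate is unbiased and E[T] = k); the uniform keep-probabilities are an assumption for illustration.

```python
import numpy as np

def bernoulli_crs_matmul(A, B, p, rng):
    """Keep column-row pair t with probability p[t]; rescale by 1/p[t] for unbiasedness."""
    mask = rng.random(p.shape) < p           # Bernoulli indicators, T = mask.sum()
    scale = np.where(mask, 1.0 / p, 0.0)     # 1/p_t if kept, 0 otherwise
    return (A * scale) @ B                   # sum_t (1/p_t) A[:, t] B[t, :]

rng = np.random.default_rng(0)
A = rng.normal(size=(32, 64))
B = rng.normal(size=(64, 16))
k = 16                                       # expected number of sampled pairs
p = np.full(64, k / 64)                      # uniform probabilities with sum(p) = k
approx = bernoulli_crs_matmul(A, B, p, rng)  # E[approx] = A @ B and E[T] = k
```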
Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction Appendix
In the following, we provide the Appendix as part of the supplementary material to the main paper. Section C contains additional content about the model zoos. We also provide visualizations of some of the properties of our model zoo for better intuition. Consider a common, fully-connected feed-forward neural network (FFN). Training of neural networks is defined as an optimization against an objective function on a given dataset, i.e., their weights and biases are chosen to minimize a cost function, usually called loss and denoted by L. The output error is propagated backward to compute each earlier layer's error δ, and the weights are updated by gradient descent, W ← W − β ∇_W L, (6) where β is a positive learning rate.
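The training procedure described above amounts to standard backpropagation with a positive learning rate β. Below is a minimal sketch; the two-layer network, tanh activation, and squared-error loss are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = 0.5 * rng.normal(size=(4, 2)), np.zeros(4)
W2, b2 = 0.5 * rng.normal(size=(1, 4)), np.zeros(1)
beta = 0.1                                    # positive learning rate

x, y = np.array([0.5, -1.0]), np.array([1.0])
for _ in range(100):
    # Forward pass through the two-layer FFN.
    a1 = np.tanh(W1 @ x + b1)
    a2 = W2 @ a1 + b2                         # linear output layer
    # Backward pass: output error, then the earlier layer's error.
    delta2 = a2 - y                           # dL/dz2 for L = 0.5 * ||a2 - y||^2
    delta1 = (W2.T @ delta2) * (1.0 - a1**2)  # chain rule through tanh
    # Gradient-descent updates with learning rate beta.
    W2 -= beta * np.outer(delta2, a1); b2 -= beta * delta2
    W1 -= beta * np.outer(delta1, x);  b1 -= beta * delta1
```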
Few-Shot Audio-Visual Learning of Environment Acoustics Supplementary Material
Moreover, we qualitatively demonstrate our model's prediction quality; please use headphones to hear the spatial audio correctly. We show two scenes and two examples per scene. As we can see, the prediction error tends to be small when the source is relatively close to the receiver, or when there are no major obstacles along the path connecting them. For our experiment with ambient environment sounds (Sec. ), we will publish the link to our datasets on our project page. Here, we provide our architecture and additional training details for reproducibility.